This notebook investigates how much data is lost by enforcing that the number of tests in a given geographical unit does not exceed the number of kids.

source("../../init.R")
source("load_and_preprocess.R")
library(tidyverse)

Load a state and inspect its tests-per-kid ratio (tests_p_kid), looking only at rows where the ratio exceeds 1.

plot_tests_p_kid_tract <- function(state_abbr) {
  merged <- single_state_tract(state_abbr, drop_if_multiple_testing_bool = FALSE)
  
  # Number and proportion of rows with tests_p_kid > 1
  # (rows with missing tests_p_kid are not counted but remain in the denominator)
  total_cases <- sum(merged$tests_p_kid > 1, na.rm = TRUE)
  proportion_cases <- total_cases / nrow(merged)
  
  # Histogram of tests_p_kid above 1, with the count and proportion in the title
  merged |> 
    filter(tests_p_kid > 1) |>
    ggplot(aes(x = tests_p_kid)) +
    geom_histogram() +
    ggtitle(paste0(state_abbr, " - Total cases: ", total_cases, " (", round(proportion_cases * 100, 2), "%)"))
}
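
To make the two numbers in the plot title concrete, here is a minimal sketch on a made-up data frame. The object toy and its values are assumptions for illustration; only the column name tests_p_kid comes from the function above. It also shows that the proportion changes depending on whether rows with a missing ratio stay in the denominator.

```r
# Toy data (hypothetical values); only the column name tests_p_kid mirrors the notebook.
toy <- data.frame(
  tract = c("A", "B", "C", "D", "E"),
  tests_p_kid = c(0.4, 1.3, NA, 2.0, 0.9)
)

# Count rows where tests exceed kids; the NA comparison is dropped by na.rm = TRUE.
total_cases <- sum(toy$tests_p_kid > 1, na.rm = TRUE)

# Share relative to all rows (the NA row stays in the denominator, as in the function above).
prop_all_rows <- total_cases / nrow(toy)

# Alternative: share among rows where the ratio is defined.
prop_non_missing <- mean(toy$tests_p_kid > 1, na.rm = TRUE)

print(c(total = total_cases, prop_all = prop_all_rows, prop_non_missing = prop_non_missing))
```

Which denominator is appropriate depends on whether rows without a ratio should count as retained data; the notebook's function uses all rows.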

Iterate over all tract states:

for (state_abbr in tract_states) {
  p <- plot_tests_p_kid_tract(state_abbr)
  print(p)
}
## [1] "Loading OH from processed_data"
## Rows: 94292 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): tract, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading PA from processed_data"
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 43340 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): tract, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading CO from processed_data"
## Rows: 6245 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): tract, tested, BLL_geq_5, BLL_geq_10, state
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 5 rows containing non-finite values (`stat_bin()`).
## [1] "Loading MD from processed_data"
## Rows: 13304 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, county
## dbl (5): year, tract, tested, BLL_geq_5, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading MA from processed_data"
## Rows: 16192 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): state, town, BLL_geq_5
## dbl (3): year, tested, tract
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading MN from processed_data"
## Rows: 15620 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): state, BLL_geq_10, BLL_geq_5
## dbl (5): tract, tested, start_year, end_year, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading NYC from processed_data"
## Rows: 20636 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (3): tract, year, n
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading NC from processed_data"
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 64328 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): tract, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading IN from processed_data"
## Rows: 16577 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (4): tract, year, tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading OR from processed_data"
## Rows: 15504 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): tested, BLL_geq_5, state
## dbl (2): tract, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading NH from processed_data"
## Rows: 3223 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, tract
## dbl (4): year, tested, BLL_geq_5, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading WI from processed_data"
## Rows: 36564 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, county
## dbl (6): tract, start_year, end_year, tested, BLL_geq_5, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Same for ZIP states:

plot_tests_p_kid_zip <- function(state_abbr) {
  merged <- single_state_zip(state_abbr, drop_if_multiple_testing_bool = FALSE)
  
  # Number and proportion of rows with tests_p_kid > 1
  # (rows with missing tests_p_kid are not counted but remain in the denominator)
  total_cases <- sum(merged$tests_p_kid > 1, na.rm = TRUE)
  proportion_cases <- total_cases / nrow(merged)
  
  # Histogram of tests_p_kid above 1, with the count and proportion in the title
  merged |> 
    filter(tests_p_kid > 1) |>
    ggplot(aes(x = tests_p_kid)) +
    geom_histogram() +
    ggtitle(paste0(state_abbr, " - Total cases: ", total_cases, " (", round(proportion_cases * 100, 2), "%)"))
}

Iterate over all ZIP states:

for (state_abbr in zip_states) {
  tryCatch({
    p <- plot_tests_p_kid_zip(state_abbr)
    print(p)
  }, error = function(e) {
    message(paste("Error processing state:", state_abbr))
  })
}
## [1] "Loading AL from processed_data"
## Rows: 7146 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (4): year, tested, BLL_geq_10, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading AZ from processed_data"
## Rows: 2086 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): county, BLL_geq_10, BLL_5_9, BLL_leq_5, BLL_geq_5, tested, state
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `tested = as.numeric(str_remove(tested, "<"))`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading IL from processed_data"
## Rows: 16016 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (2): zip, year
## lgl (2): tested, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `ell_10 = if (...) NULL`.
## ℹ In group 1: `state = "IL"`.
## Caused by warning in `min()`:
## ! no non-missing arguments to min; returning Inf
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## Error processing state: IL
## [1] "Loading NY from processed_data"
## Rows: 20907 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): state, BLL_geq_5, tested
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading RI from processed_data"
## Rows: 1551 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (3): year, tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading LA from processed_data"
## Rows: 18986 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading NJ from processed_data"
## Rows: 45804 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (2): year, n
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading VT from processed_data"
## Rows: 7942 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading CA from processed_data"
## Rows: 546 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): County, state
## dbl (3): zip, year, BLL_geq_5
## num (1): tested
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading FL from processed_data"
## Rows: 8610 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, zip, tested, BLL_geq_10
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 24 rows containing non-finite values (`stat_bin()`).
## [1] "Loading IA from processed_data"
## [1] "Error loading IA file: 'ia.csv' does not exist in current working directory ('/Users/lasse/Library/Mobile Documents/com~apple~CloudDocs/Oxford MPhil/GRA Frank/lead_map/US/lead_data/reuters/processed_data')."
## [1] "Building IA from raw_data"
## [1] "File BLL_IO_Raw.xlsx already in local folder."
## Error processing state: IA
## [1] "Loading CT from processed_data"
## Rows: 7073 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): zip, state
## dbl (3): year, tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading SC from processed_data"
## Rows: 2611 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): tested, BLL_geq_5, BLL_geq_10, state
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading DC from processed_data"
## Rows: 24068 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (2): year, n
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading MI from processed_data"
## Rows: 23232 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (2): zip, year
## lgl (2): tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `ell_5 = if (...) NULL`.
## ℹ In group 1: `state = "MI"`.
## Caused by warning in `min()`:
## ! no non-missing arguments to min; returning Inf
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## Error processing state: MI
## [1] "Loading GA from processed_data"
## Rows: 8778 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (4): BLL_leq_5, BLL_geq_5, year, tested
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading NM from processed_data"
## Rows: 2490 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (5): zip, year, tested, BLL_geq_5, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading MO from processed_data"
## Rows: 6384 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, tested, BLL_geq_5, BLL_geq_10
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading OK from processed_data"
## Rows: 16808 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading TX from processed_data"
## Rows: 24495 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): zip, tested, BLL_geq_10, BLL_5_9, BLL_geq_5, state
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `BLL_geq_5 = if (has_bll_geq_5) as.numeric(str_remove(BLL_geq_5,
##   "<")) else NA`.
## Caused by warning:
## ! NAs introduced by coercion
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading TN from processed_data"
## Rows: 9999 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading VA from processed_data"
## Rows: 8316 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## [1] "Loading KS from processed_data"
## Rows: 6952 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
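
The log above reports that IL, IA and MI failed, but not why, because the error handler discards the condition object. A minimal sketch of a loop that keeps the error text is shown below. The function process_state is a hypothetical stand-in for plot_tests_p_kid_zip (it is not part of the notebook), and the failure it raises is simulated.

```r
# Hypothetical stand-in for plot_tests_p_kid_zip(); fails for "IL" to mimic the log.
process_state <- function(state_abbr) {
  if (state_abbr == "IL") stop("simulated failure")
  paste("plot for", state_abbr)
}

errors <- list()
for (state_abbr in c("AL", "IL", "NY")) {
  # tryCatch() returns NULL on success and the error text on failure.
  err <- tryCatch({
    print(process_state(state_abbr))
    NULL
  }, error = function(e) conditionMessage(e))

  if (!is.null(err)) {
    message("Error processing state ", state_abbr, ": ", err)
    errors[[state_abbr]] <- err
  }
}
```

Collecting the messages in a named list makes it possible to review all failed states after the loop instead of searching the console output.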

Some of the ZIP states still lose a large share of rows. Let’s investigate, for each year, the share of rows with tests_p_kid > 1.

get_yearly_barplot_zip <- function(state_abbr) {
  merged <- single_state_zip(state_abbr, drop_if_multiple_testing_bool = FALSE)
  
  # Per year: number of rows, rows with tests_p_kid > 1, and their share
  merged |>
    group_by(year) |>
    summarise(
      n_cases = n(),
      n_cases_tests_p_kid_gt_1 = sum(tests_p_kid > 1, na.rm = TRUE),
      n_cases_tests_p_kid_gt_1_prop = n_cases_tests_p_kid_gt_1 / n_cases
    ) |>
    ggplot(aes(x = year, y = n_cases_tests_p_kid_gt_1_prop)) +
    geom_col() +
    # add vertical line at 2011
    geom_vline(xintercept = 2011, linetype = "dashed", color = "green") +
    ggtitle(state_abbr) +
    theme_minimal()
}
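
The per-year share computed in summarise() can be checked by hand on a small table. The sketch below uses base R and invented values; the data frame toy and its numbers are assumptions, and only the column names year and tests_p_kid mirror the notebook.

```r
# Toy data (hypothetical values).
toy <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2012),
  tests_p_kid = c(1.5, 0.8, NA, 1.2, 1.1, 0.3)
)

# Flag rows above 1; a missing ratio counts as not exceeding,
# which matches sum(tests_p_kid > 1, na.rm = TRUE).
toy$gt_1 <- !is.na(toy$tests_p_kid) & toy$tests_p_kid > 1

# Rows per year and flagged rows per year.
n_cases <- tapply(toy$gt_1, toy$year, length)
n_gt_1 <- tapply(toy$gt_1, toy$year, sum)

yearly <- data.frame(
  year = as.numeric(names(n_cases)),
  n_cases = as.vector(n_cases),
  n_gt_1 = as.vector(n_gt_1),
  prop_gt_1 = as.vector(n_gt_1 / n_cases)
)
print(yearly)
```

Here 2010 has one flagged row out of three, 2011 has two out of two, and 2012 has none, so the bar heights would be 1/3, 1 and 0.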

Iterate over all ZIP states:

for (state_abbr in zip_states) {
  tryCatch({
    p <- get_yearly_barplot_zip(state_abbr)
    print(p)
  }, error = function(e) {
    message(paste("Error processing state:", state_abbr))
  })
}
## [1] "Loading AL from processed_data"
## Rows: 7146 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (4): year, tested, BLL_geq_10, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## [1] "Loading AZ from processed_data"
## Rows: 2086 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): county, BLL_geq_10, BLL_5_9, BLL_leq_5, BLL_geq_5, tested, state
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `tested = as.numeric(str_remove(tested, "<"))`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading IL from processed_data"
## Rows: 16016 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (2): zip, year
## lgl (2): tested, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `ell_10 = if (...) NULL`.
## ℹ In group 1: `state = "IL"`.
## Caused by warning in `min()`:
## ! no non-missing arguments to min; returning Inf
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## Error processing state: IL
## [1] "Loading NY from processed_data"
## Rows: 20907 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): state, BLL_geq_5, tested
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading RI from processed_data"
## Rows: 1551 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (3): year, tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading LA from processed_data"
## Rows: 18986 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading NJ from processed_data"
## Rows: 45804 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (2): year, n
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading VT from processed_data"
## Rows: 7942 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading CA from processed_data"
## Rows: 546 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): County, state
## dbl (3): zip, year, BLL_geq_5
## num (1): tested
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading FL from processed_data"
## Rows: 8610 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, zip, tested, BLL_geq_10
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading IA from processed_data"
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 19578 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, tested, BLL_geq_5, BLL_geq_10
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading CT from processed_data"
## Rows: 7073 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): zip, state
## dbl (3): year, tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading SC from processed_data"
## Rows: 2611 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): tested, BLL_geq_5, BLL_geq_10, state
## dbl (2): year, zip
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading DC from processed_data"
## Rows: 24068 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (2): year, n
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading MI from processed_data"
## Rows: 23232 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (2): zip, year
## lgl (2): tested, BLL_geq_5
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `ell_5 = if (...) NULL`.
## ℹ In group 1: `state = "MI"`.
## Caused by warning in `min()`:
## ! no non-missing arguments to min; returning Inf
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] "Additional features added: "
## Error processing state: MI
## [1] "Loading GA from processed_data"
## Rows: 8778 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, zip
## dbl (4): BLL_leq_5, BLL_geq_5, year, tested
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading NM from processed_data"
## Rows: 2490 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (5): zip, year, tested, BLL_geq_5, BLL_geq_10
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading MO from processed_data"
## Rows: 6384 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, tested, BLL_geq_5, BLL_geq_10
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading OK from processed_data"
## Rows: 16808 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading TX from processed_data"
## Rows: 24495 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): zip, tested, BLL_geq_10, BLL_5_9, BLL_geq_5, state
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `BLL_geq_5 = if (has_bll_geq_5) as.numeric(str_remove(BLL_geq_5,
##   "<")) else NA`.
## Caused by warning:
## ! NAs introduced by coercion
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading TN from processed_data"
## Rows: 9999 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): state, zip, BLL_geq_5, BLL_geq_10, tested
## dbl (1): year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading VA from processed_data"
## Rows: 8316 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "
## [1] "Loading KS from processed_data"
## Rows: 6952 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): state, BLL_geq_5, BLL_geq_10, tested
## dbl (2): zip, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## [1] "Additional features added: "

The situation is worst in KS, VA, and TN. These are the states where lead and test counts are only reported in 4-year windows and are therefore lumped together; the preprocessing simply distributes each window's counts equally across its years, which is almost certainly wrong. The loss is less severe but still significant in OK and TX. Two potential explanations are:

- The crosswalk of kid counts from tracts to ZIPs is imperfect, in the sense that the weights used underestimate the kid counts in the computed ZIPs.
- Some states test more, and potentially more than once per kid. But this is certainly not correlated with our tract/ZIP categories, so there is no evidence that it is a driver here.
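To compare states on a single number rather than by eyeballing the histograms, the share of rows with `tests_p_kid > 1` can be tabulated per state. The following is a minimal sketch, assuming a combined data frame with a `state` column and the `tests_p_kid` column used in the plotting function; the toy values and the name `loss_by_state` are hypothetical and do not come from the actual data.

```r
library(dplyr)

# Hypothetical toy data: one row per geographic unit and year.
# `tests_p_kid` mirrors the column used in the plots; the values are made up.
toy <- tibble(
  state       = c("KS", "KS", "KS", "KS", "OK", "OK", "OK", "OK"),
  tests_p_kid = c(1.4, 0.8, 2.1, NA, 0.3, 1.2, 0.6, 0.9)
)

# Share of rows per state whose ratio exceeds 1, i.e. the rows that a
# hard cap (tests <= kids) would drop. Rows with an NA ratio stay in the
# denominator but are not counted as exceeding, matching
# sum(..., na.rm = TRUE) / nrow(...) in the plotting function.
loss_by_state <- toy |>
  group_by(state) |>
  summarise(
    n_rows     = n(),
    n_over_one = sum(tests_p_kid > 1, na.rm = TRUE),
    share_over = n_over_one / n_rows,
    .groups    = "drop"
  ) |>
  arrange(desc(share_over))

print(loss_by_state)
```

Sorting by `share_over` puts the states with the largest loss first, which would make it easy to check whether the states with windowed reporting really stand apart from the rest.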